HLT & NLP within the Arabic world:
Arabic Language and local languages processing
Status Updates and Prospects
Automatic versus interactive analysis for the massive vowelization, tagging and lemmatization of Arabic
Fathi Debili (*), Zied Ben Tahar (*), Emna Souissi (**)
(*) LLACAN, INALCO, CNRS, France; (**) ESSTT, Tunisia
Prague Arabic Dependency Treebank: A Word on the Million Words
Otakar Smrz, Viktor Bielicky, Iveta Kourilova, Jakub Kracmar, Jan Hajic, Petr Zemanek
Institute of Formal and Applied Linguistics, Charles University in Prague, Czech Republic
Arabic Named Entity Recognition using Conditional Random Fields
Yassine Benajiba and Paolo Rosso
Natural Language Engineering Lab., Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Spain
Can the building of corpus-based Arabic concordances with AraConc and DIINAR.1 tackle the issue of Arabic polyglossia?
Joseph Dichy and Ramzi Abbès
Université Lumière-Lyon 2 and ICAR (CNRS-Lyon 2)
Amazigh Data Base
El Mehdi Iazzi, Mohamed Outahajala
Institut Royal de la Culture Amazigh, Rabat, Morocco
Building an Arabic Morphological Analyzer as part of an Open Arabic NLP Platform
Lahsen Abouenour (*), Said El Hassani (**), Tawfiq Yazidy (**), Karim Bouzouba (*), Abdelfattah Hamdani (**)
(*) Mohammadia School of Engineers, (**) Institute for Studies and Research on Arabization, Rabat, Morocco
Morpho-syntactic tagging system for Arabic texts
A. Yousfi, A. El Jihad, and L. Aouragh
IERA (Institute for Studies and Research on Arabization), Rabat, Morocco
Guidelines for Annotation of Arabic Dialectness
Nizar Habash, Owen Rambow, Mona Diab and Reem Kanjawi-Faraj
Center for Computational Learning Systems, Columbia University, New York, NY, USA
Information retrieval in Arabic language
Malek Boualem (*), Ramzi Abbes (**)
(*) France Télécom Orange Labs, France; (**) Lyon 2 University / ICAR-CNRS, France
Memory-Based Vocalization of Arabic
Sandra Kübler, Emad Mohamed
Indiana University, Department of Linguistics, Bloomington, IN 47405, USA
Towards a human-machine spoken dialogue in Arabic
Younes Bahou, Lamia Hadrich Belguith, and Abdelmajid Ben Hamadou
LARIS - MIRACL Laboratory, Faculty of Economic Sciences and Management of Sfax, Sfax, Tunisia
Methods for porting NL-based restricted e-commerce systems into other languages
Najeh Hajlaoui (*), Daoud Maher Daoud (**), Christian Boitet (*)
(*) GETALP, LIG, Université Joseph Fourier, Grenoble, France
(**) Amman University, Amman, Jordan
Automatic Pronunciation Dictionary Toolkit for Arabic
Hussein Hiyassat (*), Mustafa Yaseen (**), Nihad Arabiat (***)
(*) e-Procurement Project, UNDP; (**) Amman University; (***) Ministry of Education; Amman, Jordan
Broadcast News Transcription Baseline System using the NEMLAR database
R. Bayeh (*,**), C. Mokbel (**), G. Chollet (*)
(*) TELECOM-ParisTech, CNRS-LTCI UMR-5141, Paris, France; (**) University of Balamand, Tripoli, Lebanon
Arabic-English translation improvement by target-side neural network language modeling
Maxim Khalilov (*), José A. R. Fonollosa (*), F. Zamora-Martínez (**), María J. Castro-Bleda (**), S. España-Boquera (**)
(*) Centre de Recerca TALP, Universitat Politècnica de Catalunya, Barcelona, Spain; (**) Dep. de Lenguajes y Sistemas Informáticos, Universidad Politécnica de Valencia, Valencia, Spain
Language modeling for local and Modern Standard Arabic
Ilana Heintz, Chris Brew
Department of Linguistics, Ohio State University, Columbus, USA
Towards a syntactic lexicon of Arabic Verbs
Noureddine Loukil, Kais Haddar, Abdelmajid Ben Hamadou
Institut Supérieur d’Informatique et Multimédia de Sfax, Tunisia
Automatic Morphological Rule Induction for Arabic
Ahmad Hany Hossny (*), Khaled Shaalan (**), Aly Fahmy (*)
(*) Faculty of Computers and Information, Cairo University, Egypt
(**) Faculty of Informatics, The British University in Dubai, Dubai, UAE
This workshop aims to add value to the issues addressed during the main conference (Human Language Technologies (HLT) and Natural Language Processing (NLP)) and to build on the work carried out at different sites to process Arabic language(s), and more generally Semitic languages and the other local and foreign languages spoken in the region.
It should bring together people who are actively involved in Arabic written and spoken language processing in a monolingual or cross-/multilingual context, and give them an opportunity to update the community through reports on completed and ongoing work, as well as on the availability of LRs, evaluation protocols and campaigns, products and core technologies (in particular open-source ones). This should enable the participants to develop a common view of where we stand with respect to this particular set of languages and to foster discussion of the future of this research area. Particular attention will be paid to activities involving technologies such as Machine Translation, Cross-Lingual Information Retrieval/Extraction, Summarization, Speech-to-Text transcription, etc., and languages such as Arabic varieties, Amazigh, Amharic, Hebrew, Maltese, and other local languages. Evaluation methodologies and resources for the evaluation of HLT are also a main focus.
It is clear from the various projects that Arabic has become a major language for HLT. During this workshop we will emphasize the need to focus on specific issues that would help citizens living in Arabic-speaking countries to access information and technologies in their mother tongues, and we will therefore discuss the requirements for customizing existing technologies for language pairs, e.g., English to Arabic, Amazigh, etc. Particular emphasis will be placed on tools, technologies and resources that tackle colloquial Arabic and other local languages such as Amazigh.
We expect to identify problems of common interest and possible mechanisms for moving towards solutions, such as sharing resources, tools and standards; sharing and disseminating information and expertise; adopting current best practices; and setting up joint projects and technology transfer mechanisms.
By bringing together players in the Arabic NLP field, we would like to follow up on activities discussed at similar workshops (e.g. LREC 2002), at the NEMLAR conference on Arabic Language (Cairo, Egypt, 2004) and at the workshop on Arabic NLP (Fez, April 2007, http://www.dsic.upv.es/~prosso/workshopAECI_ArabicNLP.pdf), as well as on work carried out in projects such as NET-DC, NEMLAR (www.nemlar.org) or the LDC project on "Less Commonly Taught Languages". The workshop also aims to introduce activities that will be launched shortly within the MEDAR project (the follow-up to the NEMLAR project under the European Commission's FP7). Among the crucial issues that require particular attention is the construction or update of a broadly supported Roadmap for these languages in relation to multilinguality and the evaluation of HLTs.
Topics of Interest
Submissions should address LREC issues that are specific to, and of paramount importance for, Arabic resources and evaluation. These issues include:
· Issues in the design, the acquisition, creation, management, access, distribution, use of Language Resources (Standard Arabic, Colloquial Arabic, other Semitic languages, Amazigh, Coptic, Maltese, English/French spoken locally, etc.)
· Impact on LR collections/processing and NLP of the crucial issues related to "code switching" between different dialects and languages
· Specific issues related to the above-mentioned languages, such as the role of morphology, named entities, corpus alignment, etc.
· Multilinguality issues including relationship between Colloquial and Standard Arabic
· Exploitation of LR in different types of applications
· Industrial LR requirements and the community's response
· Benchmarking of systems and products; resources for benchmarking and evaluation for written and spoken language processing
· Focus on key technologies such as MT (all approaches, e.g. Statistical, Example-Based, etc.), Information Retrieval, Speech Recognition, Spoken Document Retrieval, CLIR, Question Answering, Summarization
· Local, regional, and international activities and projects
· Needs, possibilities, forms and initiatives of/for regional and international cooperation
Format of the Workshop
It will be a full-day workshop. It is not intended to be a mini-conference but a real workshop aiming at concrete results that should clarify the situation of Arabic with respect to Language Resources and Evaluation. Sessions will include introductory speeches, invited talks, a small number of refereed presentations, etc.
Workshop chair
Khalid Choukri, ELRA/ELDA
Workshop Co-chairs
Mona Diab
Bente Maegaard, CST
Paolo Rosso, Universidad Politécnica de Valencia
Abdelhadi Soudi, ENIM (Morocco)
Ali Farghaly, Oracle
Program and Scientific Committee (tentative)
Ken Beesley, Xerox Research Centre Europe, France
Malek Boualem, France Telecom Orange Labs
Tim Buckwalter
Violetta Cavalli-Sforza, San Francisco State University (USA)
Achraf Chalabi, Sakhr
Khalid Choukri, ELRA/ELDA
Christopher Cieri, Linguistic Data Consortium, Philadelphia (USA)
Fathi Debili, CELLMA - ENS LSH Lyon (France)
Mona Diab
Joseph Dichy
Everhard Ditters
Khaled Elghamry
Ossama Emam, IBM (Egypt)
Ali Farghaly, Oracle
Abdelkader Fassi-Fehri
Gregory Grefenstette, LIC2M/CEA-LIST
Ahmed Guessoum
Nizar Habash
Mohamed Hassoun, ENSIB, Lyon
Steven Krauwer, ELSNET and
Mohamed Maamouri, LDC
Bente Maegaard, CST
John Makhoul, BBN Technologies, GTE Corp (USA)
Chafic Mokbel
Abdelhak Mouradi, ENSIAS (Morocco)
Owen Rambow
Mohsen Rashwan, RDI (Egypt)
Horacio Rodríguez, Universitat Politècnica de Catalunya
Paul Roochnick, Apptek
Mike Rosner
Paolo Rosso, Universidad Politécnica de Valencia
Salim Roukos
Jean Senellart, SYSTRAN (France)
Abdelhadi Soudi, ENIM (Morocco)
Mustafa Yassen